Analyzing a Simple C/C++ Application with VTune and
Microsoft Visual C++*
This step-by-step walkthrough shows the highlights of using VTune with a C/C++ application. You are encouraged to
explore the VTune screens (e.g., context-sensitive help and display options) while working through these instructions.
These instructions cover C/C++ programs built with Microsoft's development environment. Instructions for using
other languages and environments will be posted at the following URL as new support is defined:
http://developer.intel.com/design/perftool/vtune/prod_rev.htm
These instructions require that VTune 2.5 and Microsoft Visual C++ 5.0 are installed.
Create and Compile the Xform Application
1. Invoke Visual C++ with no active project.
2. Click File->New, select Files->Text, click OK. A blank text window appears.
3. Type in the following program:
#include <windows.h>
#include <stdio.h>
#include <winbase.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int i, j;
    double dx[3000], dy[3000];
    clock_t startTime, endTime;

    for (i=0; i<3000; i++)          /* initialize both arrays */
        dy[i] = dx[i] = i;

    startTime = clock();
    for (i=0; i<3000; i++)          /* timed region */
        for (j=0; j<9000; j++)
            dy[i] += dx[i]/0.45;
    endTime = clock();

    printf ("\n\nProgram Completed.\n\n");
    printf ("Execution time in seconds: %f\n\n",
            (double) (endTime - startTime) / CLOCKS_PER_SEC);
    Sleep (3000);                   /* pause 3 seconds before exiting */
    return 0;
}
4. Click File->SaveAs, type Xform.cpp as the filename, and choose an appropriate directory (e.g., c:\Xform).
5. Click Build->Build Xform.exe. A dialog box asks whether to create a default workspace. Click Yes. The Xform
application is then built. By default, Debug is turned on. VTune needs Debug information to display
source code.
6. After it compiles correctly, click Build->Execute. The Xform application runs for several seconds, and then ends.
Find Xform's Hotspot with VTune 2.5
1. Invoke VTune 2.5. If the VTune Assistant window appears, deselect the Show on Startup radio button, then
click the red X to close it. It can be re-invoked with View->Assistant.
2. Click File->NewProject. The NewProject wizard will ask some questions:
Program to test: Xform.exe with its path, probably c:\Xform\Debug\Xform.exe. Click Next.
Working directory: The directory that contains Xform.exe. Click Next.
Command line parameters, Beginning and ending keystrokes: Leave blank. Click Next.
How long for application to run: 20 seconds. Click Next.
Source code directory: The directory that contains Xform.cpp, probably c:\Xform. Click Next.
VTune output directory: The directory that contains Xform.exe. Click Finish.
3. Click View->ProjectOptions, click the Automation tab, select Terminate Program When Monitoring Session Ends, and click Close.
4. Click the Run->StartMonitorSession command. This runs the Xform program.
5. After several seconds the Xform window closes. If VTune is still sampling (Session starting... appears in the VTune main
menu window), click the Run->EndMonitorSession command. Several progress meters appear. VTune asks whether to open the
session. Click Yes.
6. The modules report appears. Maximize the window. It shows a system-wide view of the software modules that executed on
the system.
7. The modules are listed alphabetically from the top. Double click the Xform.exe line. The HotSpot window appears.
8. Double click on the longest red line, which is the hotspot.
9. The analysis window appears with the hotspot's source code displayed. Note the source line with the largest time
value to its left. This is the source code for Xform's hotspot.
Get Advice on Speeding Up Xform
1. Double click on the hotspot source line, dy[i] += dx[i]/0.45, to invoke VTune's C/C++ coach. The coach asks for source file
information for Xform.cpp. Click Manual Entry, then OK. The coach indicates that no source options were specified. Click
OK. The coach then runs for a short time and reports that both dx and dy could be moved out of the inner loop. Click the question
marks next to the advice for more detail.
Close the C/C++ coach window.
At this point you may want to make the suggested changes to the Xform source code, recompile, and see how much
performance improves.
2. Click the View->MixAssemblerAndSource command. Maximize the window.
Assembly language code for the program is displayed. It is annotated with CPU performance data for the instructions.
Included are Pentium® processor pairing information (color coded on the left), Pentium processor clock cycle counts,
performance penalties, and CPU usage percentages from the profiling done earlier. Other data (Pentium II
processor micro-ops, decoder groups, etc.) may be requested using the Options->ColumnDisplayOptions command.
3. VTune also gives instruction-level advice. Double click the line that has the penalty FP_Dep_ST(0) on the right to learn
more about ways performance can be improved. Then close the Advanced Instruction Analyzer window.
Invoke Dynamic Analysis
1. Dynamic Analysis is for engineers who want very detailed information about dynamic processor performance issues.
2. Click View->Source.
3. Click on [the first for (i=0; i<3000; i++)] near or on line 12, then click DynamicAnalysis->SetEntryPoint.
4. Click on [the second for (i=0; i<3000; i++)] near or on line 16, then click DynamicAnalysis->SetExitPoint. This defines the part of the
program that will be simulated for dynamic performance penalties (e.g., cache performance issues and branch
mispredictions).
5. Click DynamicAnalysis->InvokeDynamicAnalyzer. The Dynamic Analysis setup box appears. Click Start. The program
runs for several seconds, and then a box appears indicating the number of instructions and clocks simulated. Click OK.
6. The Dynamic Analysis screen appears with instruction-level details on performance issues. In the Penalties and Warnings
column, the data is formatted as follows:
Occurrences * Penalty : ClocksPerPenalty.
For example, 750 * DCache_Comp_Miss : 3 indicates 750 data cache compulsory misses (the first time an address is
referenced), with each miss causing 3 clocks of penalty, for a total of 2250 clocks.
7. Double click on any of the instructions. Detailed descriptions of the penalties are displayed.
8. The Count column indicates the number of times each instruction is executed. The Clocks column displays the percentage of CPU time
that each instruction consumed.